Fast Communication-Efficient Spectral Clustering over Distributed Data

Abstract

The last decades have seen a surge of interest in distributed computing thanks to advances in clustered computing and big data technology. Existing distributed algorithms typically assume all the data are already in one place, and divide the data and conquer on multiple machines. However, it is increasingly often that the data are located at a number of distributed sites, and one wishes to compute over all the data with low communication overhead. For spectral clustering, we propose a novel framework that enables its computation over such distributed data, with "minimal" communications while achieving a major speedup in computation. The loss in accuracy is negligible compared to the non-distributed setting. Our approach allows local parallel computing where the data are located, thus turning the distributed nature of the data into a blessing; the speedup is most substantial when the data are evenly distributed across sites. Experiments on synthetic and large UC Irvine datasets show almost no loss in accuracy with our approach and about 2x speedup under various settings with two distributed sites. As the transmitted data need not be in their original form, our framework readily addresses the privacy concern for data sharing in distributed computing.

Related articles

Communication-Efficient and Exact Clustering Distributed Streaming Data

A widely used approach to clustering a single data stream is the two-phased approach in which the online phase creates and maintains micro-clusters while the off-line phase generates the macro-clustering from the micro-clusters. We use this approach to propose a distributed framework for clustering streaming data. Our proposed framework consists of fundamental processes: one coordinator-site pr...

Entropy-based Consensus for Distributed Data Clustering

The increasingly larger scale of available data and the more restrictive concerns on their privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e. it is the negotiations among local cluster centers that are used in t...

AIDE: Fast and Communication Efficient Distributed Optimization

In this paper, we present two new communication-efficient methods for distributed minimization of an average of functions. The first algorithm is an inexact variant of the DANE algorithm [20] that allows any local algorithm to return an approximate solution to a local subproblem. We show that such a strategy does not affect the theoretical guarantees of DANE significantly. In fact, our approach...

An Efficient Distributed Data Clustering Algorithm

The k-means algorithm is one of the most popular clustering algorithms in use today. The high running time complexity of serial k-means limits its applicability for very large databases. On the other hand, the existing parallel k-means algorithms demand huge data transfer operations, incurring high communication complexity. Transfer of actual data from local sites is also unacceptable, in man...

General and Robust Communication-Efficient Algorithms for Distributed Clustering

As datasets become larger and more distributed, algorithms for distributed clustering have become more and more important. In this work, we present a general framework for designing distributed clustering algorithms that are robust to outliers. Using our framework, we give a distributed approximation algorithm for k-means, k-median, or generally any ℓ_p objective, with z outliers and/or balance ...

Journal

Journal title: IEEE Transactions on Big Data

Year: 2021

ISSN: 2372-2096, 2332-7790

DOI: https://doi.org/10.1109/tbdata.2019.2907985